Multi-Stream Approach in Acoustic Modeling
Authors: Sangita Tibrewala and Hynek Hermansky
Abstract
May 1997.

MULTI-STREAM APPROACH IN ACOUSTIC MODELING

Sangita Tibrewala¹ and Hynek Hermansky¹,²
¹Oregon Graduate Institute of Science and Technology, Portland, Oregon, USA.
²International Computer Science Institute, Berkeley, California, USA.
Email: sangita,[email protected]

ABSTRACT

In this paper we present the general framework of the multi-stream approach for automatic speech recognition (ASR). The multi-band model, which is our specific implementation of the multi-stream approach, is then described, along with experiments that demonstrate the robustness of this model. Finally, we introduce our current work, which is based on applying the multi-stream model in the modulation spectrum domain.

1. THE MULTI-STREAM CONCEPT

In conventional ASR systems, all the elements of the feature vector are treated as one entity by the classifier. As a result, localized degradation of only a few elements results in a degraded feature vector, which is then often misclassified. This suggests that allowing for independent treatment of particular elements of the feature vector may facilitate isolation of localized unreliable elements.

Also, in conventional ASR systems, features are extracted on a short-time basis (typically every 10 ms). The common assumption of the subsequent HMM modeling is that the feature vectors are independent across time. However, there is evidence suggesting that it may be beneficial to utilize existing correlations between neighboring feature vectors by looking at longer (about syllable-length) segments of speech. For example:

- Robust features such as RASTA [3] and delta features are extracted by considering about 50 ms to 200 ms of signal around the current frame.
- Psychoacoustic phenomena such as temporal masking demonstrate that the perception of sounds in humans is influenced by about 200 ms of the preceding sound.
- Recently, data-driven filters were designed using linear discriminant analysis (LDA) on the Switchboard task [4]. The resulting FIR filters were 1 sec in length, with dominant weighting around 200-300 ms.

Therefore we advocate a shift in the current ASR paradigm in two respects: from the assumption of dependency among the elements of the feature vector to the assumption of relative independence between them, and from the assumption of independence of feature vectors across time to the assumption of syllable-length time dependencies between them.

[Figure: axes labeled "independency in time" and "dependency between ..."]
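The temporal-context argument in Section 1 can be made concrete with delta features. The following is a minimal sketch, not the authors' implementation: it assumes the widely used regression formula for deltas, a 10 ms frame step, and edge handling by repeating the first and last frame. The names `delta_features`, `context_ms`, and `half_window` are hypothetical and chosen only for illustration.

```python
def delta_features(frames, half_window=2):
    """Regression-based delta features over +/- half_window frames.

    frames: list of equal-length feature vectors, one per frame step
    (assumed to be 10 ms apart). Edge frames are handled by repeating
    the first and last frame, which is an assumption of this sketch.
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    # Normalizer of the regression formula: 2 * sum of n^2.
    norm = 2.0 * sum(n * n for n in range(1, half_window + 1))

    def frame_at(t):
        # Clamp the index so the window never runs off either end.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, half_window + 1):
                acc += n * (frame_at(t + n)[k] - frame_at(t - n)[k])
            row.append(acc / norm)
        deltas.append(row)
    return deltas


def context_ms(half_window, frame_step_ms=10):
    """Approximate span of signal covered by the delta window, in ms."""
    return (2 * half_window + 1) * frame_step_ms
```

Under these assumptions, `half_window=2` covers roughly 50 ms of signal and `half_window=10` roughly 210 ms, which brackets the 50 ms to 200 ms range mentioned in the text. Each output value depends on several neighboring frames, so the resulting vectors are not independent across time.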
Related resources
Using the Multi-Stream Approach for Continuous Audio-Visual Speech Recognition: Experiments on the M2VTS Database
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous decoding of continuous speech while still allowing for some asynchrony of the visual and acoustic information streams. First, the Multi-Stream f...
Multi-tape finite-state transducer for asynchronous multi-stream pattern recognition with application to speech
In this thesis, we have focused on improving the acoustic modeling of speech recognition systems to increase the overall recognition performance. We formulate a novel multi-stream speech recognition framework using multi-tape finite-state transducers (FSTs). The multi-dimensional input labels of the multi-tape FST transitions specify the acoustic models to be used for the individual feature str...
Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling
One method to achieve robust speech recognition in adverse conditions including noise and reverberation is to employ acoustic modelling techniques involving neural networks. Using long short-term memory (LSTM) recurrent neural networks proved to be efficient for this task in a setup for phoneme prediction in a multi-stream GMM-HMM framework. These networks exploit a self-learnt amount of tempor...
Uncertainty driven Compensation of Multi-Stream MLP Acoustic Models for Robust ASR
In this paper we show how the robustness of multi-stream multi-layer perceptron (MLP) acoustic models can be increased through uncertainty propagation and decoding. We demonstrate that MLP uncertainty decoding yields consistent improvements over using minimum mean square error (MMSE) feature enhancement in MFCC and RASTA-LPCC domains. We introduce as well formulas for the computation of the unc...
On using Articulatory Features for Discriminative Speaker Adaptation
This paper presents a way to perform speaker adaptation for automatic speech recognition using the stream weights in a multi-stream setup, which included acoustic models for “Articulatory Features” such as ROUNDED or VOICED. We present supervised speaker adaptation experiments on a spontaneous speech task and compare the above stream-based approach to conventional approaches, in which the model...